Forge JIT Backend Integration #35
base: main
Conversation
This reverts commit c84d01c.
Hi @auto-differentiation-dev, thanks for the thorough review! I refactored the benchmark tests quite a bit - I hope I incorporated everything as requested. QL is now built three times (native double, XAD with JIT=OFF, XAD with JIT=ON) and a combined report is created:
Hi @auto-differentiation-dev, this PR is now ready for another round of review. I put the overhead workflow into a new PR that is checked out from this PR's branch, and I'll follow up on it (#37) once this one is merged. Best, Daniel
Hi @da-roth,

Sorry it took us some time to come back on this, and thank you for all the work - it is taking good shape.

For XAD, we think only the XAD-split mode should be reported. This is the practical way to run Monte Carlo with XAD using path-wise derivatives, and it is what we would encourage users to adopt (see https://auto-differentiation.github.io/faq/#quant-finance-applications).

Looking at the results, we noticed a few patterns in the reported timings that seem unexpected and would be good to clarify. Overall, all methods show the expected linear behaviour - a fixed overhead plus a component that grows linearly with the number of paths. That said, the relative behaviour between methods looks unusual. In particular:

This makes us wonder whether there might be something off in how the timings are being measured or attributed. It would be helpful to understand how you interpret these trends. Thanks again, and happy to discuss further.
fixes
use original evolve for xad-split
tried fixes
first try
added local script for fd and running benchmark locally
Hi @da-roth, looking at the numbers again, maybe it is just a matter of increasing the Monte Carlo workload so the runtime isn't dominated by bootstrapping. That would better reflect a real-world setup/application. For example, we could include a portfolio of swaptions and additional cashflows.
Hi @auto-differentiation-dev, I did some investigating, and your remarks and intuitions were right. The QL code did some nasty re-computations of matrices during each step of the MC simulation. The code implementing this example is not optimal - but that makes it a really good working example for future work: it shows the impact of the overhead between double and AReal, and the latest results also indicate where Forge is still suboptimal. Anyway, I did some minor optimization so that the matrix is only computed once per path, and I see this locally:
Timings AReal with JIT = ON:
So the AAD + AReal overhead (of course amplified by the unoptimal implementation in QL) still gives a roughly 7.5x benefit for this example of XAD vs native double. Interestingly, XAD-split is faster than JIT - I think it shows the benefit of how XAD is optimized, and the overhead of doing unnecessary computations. The overhead of JIT vs JIT-AVX does not surprise me too much - I spent some time improving the throughput of setting inputs and getting outputs. Hence we have something like 4 lanes plus some infrastructural improvements that can be applied to the scalar JIT as well; I'll apply them there in the future. Let's see how the benchmarks look in the cloud. Would you like any changes here? I really like the example, since it gives us all the insights for future improvements, but of course one could create something with a higher speed-up compared to native FD if wished (more inputs, digging further into avoiding unnecessary computations, etc.). Thinking out loud, my intuition is: the better XAD performs compared to native FD, the nearer scalar JIT will get to XAD-split (and at some point it will be slightly faster, as we saw in the XAD repo's results). Cheers, Daniel
This reverts commit 4a47b2b.

This PR integrates the Forge JIT backend for XAD, adding optional native code generation support. Forge is an optional dependency - everything builds and runs without it.
Changes
Build options added:
Files added:
Files modified:
Benchmarks
The benchmark workflow (ql-benchmarks.yaml) runs swaption pricing benchmarks comparing FD, XAD tape, JIT scalar, and JIT-AVX methods on Linux and Windows.
Also included some initial work towards #33 - the workflow has type overhead jobs that compare double vs xad::AReal pricing performance (no derivatives) on the same hardware, providing a baseline for measuring XAD type overhead.
Example benchmark run (Linux) Link